A Comparison of Unsupervised Bilingual Term Extraction Methods Using Phrase-Tables
Authors
Abstract
Automatic bilingual term extraction is essential for providing human translators with a consistent bilingual term list when they translate a set of documents. We compare three statistical measures for extracting bilingual terms from a phrase-table built from a parallel corpus. We show that these measures extract different bilingual term candidates and that a combination of the measures ranks valid bilingual terms highly.
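As a rough illustration of combining several measures into one ranking, the sketch below scores phrase-table entries under three measures and orders them by mean rank. The abstract does not name the three measures or the combination scheme, so the measures used here (direct probability, inverse probability, and a count-weighted product), the `PhrasePair` fields, and the mean-rank combination are all assumptions chosen for illustration, not the paper's method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhrasePair:
    """One hypothetical phrase-table entry (field names are assumptions)."""
    source: str
    target: str
    p_t_given_s: float  # direct phrase translation probability
    p_s_given_t: float  # inverse phrase translation probability
    count: int          # joint occurrence count in the parallel corpus


def rank_positions(pairs, measure):
    """Map each pair to its 1-based rank under one measure (higher score = better)."""
    ordered = sorted(pairs, key=measure, reverse=True)
    return {pair: pos for pos, pair in enumerate(ordered, start=1)}


def combine_by_mean_rank(pairs, measures):
    """Order pairs by their average rank across all measures (lower = better)."""
    rankings = [rank_positions(pairs, m) for m in measures]

    def mean_rank(pair):
        return sum(r[pair] for r in rankings) / len(rankings)

    return sorted(pairs, key=mean_rank)


# Three placeholder measures; the paper's actual measures are not given here.
MEASURES = [
    lambda p: p.p_t_given_s,
    lambda p: p.p_s_given_t,
    lambda p: p.count * p.p_t_given_s * p.p_s_given_t,
]

# Made-up toy entries, not data from the paper.
PAIRS = [
    PhrasePair("phrase table", "table de traduction", 0.8, 0.7, 12),
    PhrasePair("of the", "de la", 0.3, 0.2, 500),
    PhrasePair("term list", "liste de termes", 0.9, 0.6, 5),
]

ranked = combine_by_mean_rank(PAIRS, MEASURES)
for pair in ranked:
    print(pair.source, "->", pair.target)
```

In this toy data each measure prefers a different candidate, yet the frequent function-word pair still ends up last once the ranks are averaged, which is the kind of effect a combined ranking is meant to achieve.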
Similar resources
Using word2vec for Bilateral Translation
Word and phrase tables are key inputs to machine translation, but they are costly to produce. New unsupervised learning methods represent words and phrases in a high-dimensional vector space, and these monolingual embeddings have been shown to encode syntactic and semantic relationships between language elements. The information captured by these embeddings can be exploited for bilingual translation by...
Bilingual Termbank Creation via Log-Likelihood Comparison and Phrase-Based Statistical Machine Translation
Bilingual termbanks are important for many natural language processing (NLP) applications, especially in translation workflows in industrial settings. In this paper, we apply a log-likelihood comparison method to extract monolingual terminology from the source and target sides of a parallel corpus. Then, using a Phrase-Based Statistical Machine Translation model, we create a bilingual terminolo...
Bilingual Data Cleaning for SMT using Graph-based Random Walk
The quality of bilingual data is a key factor in Statistical Machine Translation (SMT). Low-quality bilingual data tends to produce incorrect translation knowledge and also degrades translation modeling performance. Previous work often used supervised learning methods to filter low-quality data, but these require a fair number of human-labeled examples, which are not easy to obtain. To reduce the re...
Learning Better Rule Extraction with Translation Span Alignment
This paper presents an unsupervised approach to learning translation span alignments from parallel data that improves syntactic rule extraction by deleting spurious word alignment links and adding new valuable links based on bilingual translation span correspondences. Experiments on Chinese-English translation demonstrate improvements over standard methods for tree-to-string and tree-to-tree tr...
Building a Bilingual Lexicon Using Phrase-based Statistical Machine Translation via a Pivot Language
This paper proposes a novel method for building a bilingual lexicon through a pivot language by using phrase-based statistical machine translation (SMT). Given two bilingual lexicons for the language pairs Lf–Lp and Lp–Le, we treat these lexicons as parallel corpora. Then, we merge the two extracted phrase tables into one phrase table between Lf and Le. Finally, we construct a phrase-based SMT...